perm filename CH5G[206,LSP] blob
sn#241026 filedate 1976-09-18 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00010 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 .REQUIRE "SETUP" SOURCE_FILE
C00007 00003 .ss Some facts about the PDP-10.
C00013 00004 .ss Code produced by LISP compilers.
C00014 00005 .BEGIN VERBATIM SELECT 1
C00017 00006 Note that all three compilers produce the same first line of
C00020 00007 .ss Listings of LCOM0 and LCOM4
C00021 00008 .SKIP TO LINE 1
C00022 00009 .SKIP TO LINE 1
C00035 00010 .SKIP TO LINE 1
C00039 ENDMK
C⊗;
.REQUIRE "SETUP" SOURCE_FILE
.NEXT SECTION
.NEXT SECTION
.sec Compiling in Lisp
.ss Introduction
Compiling is an important example of symbolic computation and
has received much study. Much of the study has been devoted
to parsing which is essentially the transformation of an input string
in the source language into an internal form. The internal form used
depends on the compiler. Sometimes it is Polish prefix or postfix notation,
sometimes it is list structure, and sometimes it consists of entries in
a collection of tables.
When internal notation LISP is being compiled, the parsing is
trivial, because the input is S-expressions and the internal form wanted
is list structure, so what parsing there is is done by the ordinary LISP
%3read%1 routine. Therefore, compilers can be very compact and
transparent in structure, and we can concentrate our attention on the code
generation phase. This is as it should be, because, in my opinion,
parsing is basically a side issue in compiling, and code generation is the
matter of main scientific interest.
We shall describe two compilers in this chapter called LCOM0 and
LCOM4 which compile S-expression LISP into machine language for the PDP-10
computer according to the conventions of Stanford LISP 1.6. For now we
shall take these conventions for granted. Before describing the
compilers, we must describe these conventions.
The target language is called LAP for LISP assembly program. Each
function is compiled separately into a separate LAP program, and these
programs are written onto a disk file. These files can later be read into
a LISP core image and assembled in place by the LAP assembler which is
part of the LISP runtime routines. The compiler, on the other hand, is a
separate LISP core image that can be instructed to compile several input
files. For an input file called <name>, it produces and output file
called <name>.LAP. All this is specific to the PDP-10 time-sharing system
and fortunately works the same whether the time-sharing system is the
D.E.C. system, the Stanford system, or TENEX.
The LAP program produced is a list of words; the last word is NIL,
and the first word is a header of the form (LAP %3fname%1 SUBR) where
%3fname%1 is the name of the function compiled. This header tells the
DSKIN program that it should read S-expressions till it comes to NIL and
then submit what it has collected to the LAP assembler which will assemble
it into core. The rest of the words are either atoms representing labels
or lists representing single PDP-10 instructions.
.ss Some facts about the PDP-10.
The following facts about the PDP-10 may be of use:
The PDP-10 has a 36 bit word and an 18 bit address. In
instructions and in accumulators used as index registers the address is
found in the right part of the word where the least significant bits in
arithmetic reside.
There are 16 general registers which serve simultaneously as
accumulators (receiving the results of arithmetic operations), index
registers (modifying the nominal addresses of instructions to form
effective addresses), and as the first 16 registers of memory (if the
effective address of an instruction is less than 16, then the
instruction uses the corresponding general register as its operand.)
All instructions have the same format and are written for the
LAP assembly program in the form
(<op name> <accumulator> <address> <index register>).
Thus #(MOVE 1 3 P)# causes accumulator #1# to receive the contents of
a memory register whose address is #3+c(P), i.e. 3+<the contents of
general register P>.
If no index register is specified, then the address used is just <address>.
If the op name is followed by #@, then the memory
reference is indirect, i.e. the effective addrress is the contents of
the right half of the word directly addressed by the instruction
(modified by the index register and indirect bit of that word). In
the following description of some instructions useful in constructing
the compiler, <ef> denotes the effective address of an instruction.
.BEGIN INDENT 8,30,8
.PREFACE 0
.AT 0 ⊂BREAK⊃
.AT 24 ⊂""⊃
.TURN ON "∂"
.VARIABLE I
.I ← 31
MOVE ∂(I)(ac) ← c(<ef>)
MOVEI ∂(I)c(ac) ← <ef>
MOVEM ∂(I)c(<ef>) ← c(ac)
HLRZ ∂(I)right half of c(ac) ← left half of c(<ef>)
HRRZ ∂(I)right half of c(ac) ← right half of c(<ef>)
∂(16)(These two are used indirectly for #car# and #cdr# respectively.)
ADD ∂(I)c(ac) ← c(ac) + c(<ef>)
SUB ∂(I)c(ac) ← c(ac) - c(<ef>)
JRST ∂(I)go to <ef>
JUMPE ∂(I)if c(ac) %2=%* 0 then go to <ef>
JUMPN ∂(I)if c(ac) %2≠%* 0 then go to <ef>
CAME ∂(I)if c(ac) %2=%* c(<ef>) then <skip next instruction>
CAMN ∂(I)if c(ac) %2≠%* c(<ef>) then <skip next instruction>
PUSH ∂(I)c(c(right half of ac)) ← c(<ef>); the contents
of each half of ac is increased by one and if
the contents of the left half is then #0, a stack
overflow interrupt occurs. (PUSH P ac) is
is used in LISP to put c(ac) on the
stack.
POP ∂(I)c(<ef>) ←c(c(right half of ac)); the
contents of each half of ac are then decreased
by 1. This may be used for removing the
top element of the stack. Thus (POP P 1) puts
the top element of the stack in accumulator 1.
The i-th element of the stack is obtained by
(MOVE@ 1 i P).
POPJ ∂(I)(POPJ P) is used for returning from a subroutine
.END
These instructions are adequate for compiling basic LISP code
with the addition of the subroutine calling pseudo-instrucion. (CALL
%3n%* (E <subr) S) is used for calling the LISP subroutine <subr> with %3n%1
arguments. The convention is that the arguments will be stored in
successive accumulators beginning with accumulator #1, and the result
will be returned in accumulator #1. In particular the functions
#ATOM# and #CONS# are called with #(CALL 1 (E ATOM) S)# and (CALL 2 (E
CONS) S) respectively, and programs produced by you compiler will be
called similarly when their names are referred to.
.ss Code produced by LISP compilers.
We will discuss two compilers, a simple one called LCOM0 and a
more optimising compiler called LCOM4. LCOM4 produces about half as many
instructions for a given function as does LCOM0. Besides these, there is
the standard PDP-10 LISP compiler written at M.I.T. by Richard Greenblatt
and Stuart Nelson and modified by Whitfield Diffie.
.SKIP TO LINE 1
.BEGIN VERBATIM SELECT 1
Examples of output of LCOM0, LCOM4, and the regular Lisp compiler
For the file containing:
(DEFPROP DROP
(LAMBDA(X)
(COND ((NULL X) NIL) (T (CONS (LIST (CAR X)) (DROP (CDR X))))))
EXPR)
******************************************************************
LCOM0 procuces the following code:
(LAP DROP SUBR)
(PUSH P 1)
(MOVE 1 0 P)
(PUSH P 1)
(MOVE 1 0 P)
(SUB P (C 1 0 1 0))
(CALL 1 (E NULL) S)
(JUMPE 1 G0163)
(MOVEI 1 0)
(JRST G0162)
G0163
(MOVEI 1 (QUOTE T))
(JUMPE 1 G0164)
(MOVE 1 0 P)
(PUSH P 1)
(MOVE 1 0 P)
(SUB P (C 1 0 1 0))
(CALL 1 (E CAR) S)
(PUSH P 1)
(MOVE 1 0 P)
(SUB P (C 1 0 1 0))
(CALL 1 (E LIST) S)
(PUSH P 1)
(MOVE 1 -1 P)
(PUSH P 1)
(MOVE 1 0 P)
(SUB P (C 1 0 1 0))
(CALL 1 (E CDR) S)
(PUSH P 1)
(MOVE 1 0 P)
(SUB P (C 1 0 1 0))
(CALL 1 (E DROP) S)
(PUSH P 1)
(MOVE 1 -1 P)
(MOVE 2 0 P)
(SUB P (C 2 0 2 0))
(CALL 2 (E CONS) S)
(JRST G0162)
G0164
G0162
(SUB P (C 1 0 1 0))
(POPJ P)
NIL
LCOM4 produces the following code:
(LAP DROP SUBR)
(PUSH P 1)
(MOVE 1 0 P)
(JUMPE 1 G0162)
(HLRZ@ 1 0 P)
(CALL 1 (E LIST) S)
(PUSH P 1)
(HRRZ@ 1 -1 P)
(CALL 1 (E DROP) S)
(MOVE 2 1)
(MOVE 1 0 P)
(SUB P (C 1 0 1 0))
(CALL 2 (E CONS) S)
G0162
(SUB P (C 1 0 1 0))
(POPJ P)
NIL
******************************************************************
ILISP COMPILER produces the following code:
(LAP DROP SUBR)
(PUSH P 1)
(JUMPE 1 TAG1)
(HLRZ@ 1 0 P)
(CALL 1 (E NCONS) S)
(PUSH P 1)
(HRRZ@ 1 -1 P)
(CALL 1 (E DROP) S)
(POP P 2)
(CALL 2 (E XCONS) S)
TAG1 (SUB P (C 1 0 1 0))
(POPJ P)
NIL
.END
Note that all three compilers produce the same first line of
heading. This is necessary for the LAP assembly program which also
requires the #NIL# at the end as punctuation. The standard compiler
uses a function called #XCONS# that has the same effect as CONS#
except that it receives its arguments in the reverse order which is
sometimes convenient. This requires, of course, that the compiler be
smart enough to decide where it is more convenient to put the
arguments of the %3cons%1 function.
My two compilers are similar in structure, but the better
one, which is twice as long, uses the following optimisations.
1. When the argument of CAR or CDR is a variable, it compiles
a (HLRZ@ 1 i P) or (HRRZ@ 1 i P) which gets the result through the
stack without first compiling the argument into an accumulator.
2. When it has to set up the arguments of a function in the
accumulators, in general, the program must compute the arguments one
at a time and save them on the stack, and then load the accumulators
from the stack. However, if one of the arguments is a variable, is a
quoted expression, or can be obtained from a variable by a chain of
#cars# and #cdrs, then it need not be computed until the time of
loading accumulators since it can be computed using only the
accumulator in which it is wanted.
3. The Diffy compiler produces about 10 percent less code
than LCOM4; the main difference seems to be that it sometimes notices
when it has an expression that it will need later. Sometimes this
feature leads it astray. For example, it may save #%4a %3u%1#
for later use at greater cost than re-computing it.
It should further be noted that quoted S-expressions can be
compiled with the instruction
(MOVEI 1 (QUOTE α)).
.ss Listings of LCOM0 and LCOM4
First are listings of both compilers in blackboard notation.
Following these is an an annotated listing of LCOM0 in S-expression
notation, and a listing of LCOM4 in that notation.
.SKIP TO LINE 1
LCOM0:
.REQUIRE "LC0.BB" SOURCE_FILE
.SKIP TO LINE 1
LCOM4:
.REQUIRE "LC4.BB" SOURCE_FILE
.SKIP TO LINE 1
Annotated listing of LCOM0:
.BEGIN VERBATIM SELECT 1
(DEFPROP LC0FNS
(LC0FNS COMPL
COMP
PRUP
MKPUSH
COMPEXP
COMPLIS
LOADAC
COMCOND
COMBOOL
COMPANDOR)
VALUE)
~COMPL is the user-callable driver. It is a FEXPR. It takes as
~ an argument a single file name, which must have no extension.
~ EXPRs on a file called FILNAM will be compiled into LAP and
~ written on the file FILNAM.LAP. Other types of function
~ definitions and non-definitions are simply copied to output.
(DEFPROP COMPL
(LAMBDA(FILE)
(PROG (Z)
(EVAL
(CONS (QUOTE OUTPUT)
(CONS (QUOTE DSK:)
(LIST (CONS (CAR FILE) (QUOTE LAP))))))
(EVAL (CONS (QUOTE INPUT) (CONS (QUOTE DSK:) FILE)))
(INC T NIL)
LOOP (SETQ Z (ERRSET (READ)))
(COND ((ATOM Z) (GO DONE)))
(SETQ Z (CAR Z))
(COND ((OR (EQ (CAR Z) (QUOTE DE))
(AND (EQ (CAR Z) (QUOTE DEFPROP))
(EQ (CADDDR Z) (QUOTE EXPR))))
(PROG (PROG)
(SETQ PROG
(COND ((EQ (CAR Z) (QUOTE DE))
(COMP (CADR Z)
(CADDR Z)
(CADDDR Z)))
(T
(COMP (CADR Z)
(CADR (CADDR Z))
(CADDR (CADDR Z))))))
(OUTC T NIL)
(MAPC (FUNCTION PRINT) PROG)
(OUTC NIL NIL)
(PRINT (LIST (CADR Z) (LENGTH PROG)))))
(T (OUTC T NIL) (PRINT Z) (OUTC NIL NIL)))
(GO LOOP)
DONE (OUTC T NIL)
(OUTC NIL T)
(INC NIL T)
(RETURN (QUOTE ENDCOMP))))
FEXPR)
~COMP compiles a single function definition, returning a list of
~ the LAP code corresponding to the definition.
~ FN is the atomic name of the function being compiled.
~ VARS is the formal parameter list for the function.
~ EXP is the function body.
(DEFPROP COMP
(LAMBDA(FN VARS EXP)
((LAMBDA(N)
(APPEND (LIST (LIST (QUOTE LAP) FN (QUOTE SUBR)))
(MKPUSH N 1)
(COMPEXP EXP (MINUS N) (PRUP VARS 1))
(LIST
(LIST (QUOTE SUB) (QUOTE P) (LIST (QUOTE C) N 0 N 0)))
(QUOTE ((POPJ P) NIL))))
(LENGTH VARS)))
EXPR)
~PRUP returns an A-LIST formed by pairing successive elements of
~ VARS with consecutive integers beginning with N.
(DEFPROP PRUP
(LAMBDA(VARS N)
(COND ((NULL VARS) NIL)
(T (CONS (CONS (CAR VARS) N) (PRUP (CDR VARS) (PLUS N 1))))))
EXPR)
~MKPUSH returns a list of N (PUSH P i) instructions, where i runs
~ from M to M+N-1. Used to push arguments onto the stack.
(DEFPROP MKPUSH
(LAMBDA(N M)
(COND ((LESSP N M) NIL)
(T
(CONS (LIST (QUOTE PUSH) (QUOTE P) M)
(MKPUSH N (PLUS M 1))))))
EXPR)
~COMPEXP is the heart of LCOM0. It determines precisely
~ what an expression is, and compiles appropriate code
~ for it. It returns a list of that code.
~ EXP is the expression to be compiled.
~ M is minus the number of entries on the stack. When
~ added to a value retrieved from the A-LIST VPR, it
~ can be used to locate a variable on the stack.
~ VPR is an A-LIST, associating variable names with
~ numbers which, when added to M, give stack offsets.
~ Both M and VPR maintain these definitions throughout.
(DEFPROP COMPEXP
(LAMBDA(EXP M VPR)
(COND ((NULL EXP) (QUOTE ((MOVEI 1 0)))) ~NIL
((EQ EXP T) (QUOTE ((MOVEI 1 (QUOTE T))))) ~T
((ATOM EXP) ~variable
(LIST
(LIST (QUOTE MOVE)
1
(PLUS M (CDR (ASSOC EXP VPR)))
(QUOTE P))))
((OR (EQ (CAR EXP) (QUOTE AND)) ~boolean expression
(EQ (CAR EXP) (QUOTE OR))
(EQ (CAR EXP) (QUOTE NOT)))
((LAMBDA(L1 L2)
(APPEND
(COMBOOL EXP M L1 NIL VPR)
(LIST (QUOTE (MOVEI 1 (QUOTE T)))
(LIST (QUOTE JRST) 0 L2)
L1
(QUOTE (MOVEI 1 0))
L2)))
(GENSYM)
(GENSYM)))
((EQ (CAR EXP) (QUOTE COND)) ~COND
(COMCOND (CDR EXP) M (GENSYM) VPR))
((EQ (CAR EXP) (QUOTE QUOTE)) ~QUOTE
(LIST (LIST (QUOTE MOVEI) 1 EXP)))
((ATOM (CAR EXP)) ~function call
((LAMBDA(N)
(APPEND
(COMPLIS (CDR EXP) M VPR)
(LOADAC (DIFFERENCE 1 N) 1)
(LIST
(LIST (QUOTE SUB) (QUOTE P) (LIST (QUOTE C) N 0 N 0)))
(LIST
(LIST (QUOTE CALL)
N
(LIST (QUOTE E) (CAR EXP))
(QUOTE S)))))
(LENGTH (CDR EXP))))
((EQ (CAAR EXP) (QUOTE LAMBDA)) ~LAMBDA expression
((LAMBDA(N)
(APPEND
(COMPLIS (CDR EXP) M VPR)
(COMPEXP
(CADDAR EXP)
(DIFFERENCE M N)
(APPEND (PRUP (CADAR EXP) (DIFFERENCE 1 M)) VPR))
(LIST
(LIST (QUOTE SUB) (QUOTE P) (LIST (QUOTE C) N 0 N 0)))))
(LENGTH (CDR EXP))))
(T NIL))) ~oops
EXPR)
~COMPLIS compiles code to evaluate each expression in a list of
~ expressions and to push those values onto the stack. It
~ returns a list of that code. It is used to compile code
~ to evaluate arguments to called functions or LAMBDA expressions.
~ U is a list of expressions.
(DEFPROP COMPLIS
(LAMBDA(U M VPR)
(COND ((NULL U) NIL)
(T
(APPEND (COMPEXP (CAR U) M VPR)
(QUOTE ((PUSH P 1)))
(COMPLIS (CDR U) (DIFFERENCE M 1) VPR)))))
EXPR)
~LOADAC returns a list of (MOVE i j P) instructions, loading
~ consecutive accumulators from the top of the stack.
~ K indexes the accumulator loaded.
~ N indexes the stack offset.
(DEFPROP LOADAC
(LAMBDA(N K)
(COND ((GREATERP N 0) NIL)
(T
(CONS (LIST (QUOTE MOVE) K N (QUOTE P))
(LOADAC (PLUS N 1) (PLUS K 1))))))
EXPR)
~COMCOND compiles a COND.
~ U is a list of clauses in the COND.
~ L is a label to be emitted at the end of all code for
~ the COND.
(DEFPROP COMCOND
(LAMBDA(U M L VPR)
(COND ((NULL U) (LIST L))
(T
((LAMBDA(L1)
(APPEND (COMBOOL (CAAR U) M L1 NIL VPR)
(COMPEXP (CADAR U) M VPR)
(LIST (LIST (QUOTE JRST) L) L1)
(COMCOND (CDR U) M L VPR)))
(GENSYM)))))
EXPR)
~COMBOOL compiles code for a single predicate. That is, the
~ code generated evaluates the predicate and branches somewhere,
~ depending on the value.
~ P is the predicate.
~ L is a label which represents the branch point.
~ FLG is a flag. If FLG is NIL, code is to fall thru on non-NIL
~ result and branch to L on NIL result. If FLG is non-NIL,
~ code is to fall thru on NIL result and branch to L on
~ non-NIL result.
(DEFPROP COMBOOL
(LAMBDA(P M L FLG VPR)
(COND ((ATOM P) ~simple variable
(APPEND
(COMPEXP P M VPR)
(LIST
(LIST (COND (FLG (QUOTE JUMPN)) (T (QUOTE JUMPE))) 1 L))))
((EQ (CAR P) (QUOTE AND)) ~conjunction
(COND ((NOT FLG) (COMPANDOR (CDR P) M L NIL VPR))
(T
((LAMBDA(L1)
(APPEND (COMPANDOR (CDR P) M L1 NIL VPR)
(LIST (LIST (QUOTE JRST) 0 L))
(LIST L1)))
(GENSYM)))))
((EQ (CAR P) (QUOTE OR)) ~disjunction
(COND (FLG (COMPANDOR (CDR P) M L T VPR))
(T
((LAMBDA(L1)
(APPEND (COMPANDOR (CDR P) M L1 T VPR)
(LIST (LIST (QUOTE JRST) 0 L))
(LIST L1)))
(GENSYM)))))
((EQ (CAR P) (QUOTE NOT)) ~negation
(COMBOOL (CADR P) M L (NOT FLG) VPR))
(T ~other expression
(APPEND
(COMPEXP P M VPR)
(LIST
(LIST (COND (FLG (QUOTE JUMPN)) (T (QUOTE JUMPE)))
1
L))))))
EXPR)
~COMPANDOR compiles code for lists of predicates connected
~ conjunctively or disjunctively.
~ U is a list of predicates.
~ L is a label.
~ FLG is a flag. If FLG is NIL, we are to fall thru on non-NIL
~ results and branch to L on NIL results (AND case). If FLG
~ is non-NIL, we are to fall thru on NIL results and branch
~ to L on non-NIL results (OR case).
(DEFPROP COMPANDOR
(LAMBDA(U M L FLG VPR)
(COND ((NULL U) NIL)
(T
(APPEND (COMBOOL (CAR U) M L FLG VPR)
(COMPANDOR (CDR U) M L FLG VPR)))))
EXPR)
.END
.SKIP TO LINE 1
LCOM4:
.BEGIN VERBATIM SELECT 1
.REQUIRE "LCOM4" SOURCE_FILE
.END